Conversation

@teofr
Contributor

@teofr teofr commented Jan 28, 2026

Changes to the V2 language definition, mainly to facilitate creating an LR(1) parser. The more complex ones are:

  • Changes to TupleDeconstructionStatement, making it stricter, with a clear separation between var-style declarations and explicitly typed ones.
  • Changes to the IdentifierPath, due to making address a reserved keyword.

For another PR/discussion: we considered merging TupleDeconstructionStatement and VariableDeclaration so that all variable declarations live together. However, I think this would look a bit artificial, since their shapes are quite different. We can still force it if we see value in it, but I don't think it's worth it for now; they'll probably be joined in one of the passes that simplify the AST.

@teofr teofr requested review from a team as code owners January 28, 2026 10:12
@changeset-bot

changeset-bot bot commented Jan 28, 2026

⚠️ No Changeset found

Latest commit: c14dcbf

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Contributor

@OmarTawfik OmarTawfik left a comment

Left a few questions/suggestions. Thanks!

- `HexLiteral` and `YulHexLiteral` and `DecimalLiteral` and `YulDecimalLiteral`:
- It was illegal for them to be followed by `IdentifierStart`. Now we will produce two separate tokens rather than rejecting it.

## Language Definition Changes
Contributor

I suggest renaming this section to Grammar, since the rest of the doc also lists language definition changes:

## Grammar


### IdentifierPath

Changed from a simple `Separated` list to a structured format to allow the reserved `address` keyword to appear in identifier paths (but not as the head):
Contributor

I wonder if we are able to just use Separated(MemberAccessIdentifier, Period) for simplicity? We won't need the extra type, given how commonly IdentifierPath is used.

Contributor

Also, WDYT of IdentifierPathElement instead of MemberAccessIdentifier? The latter name conflicts with the fact that one of its two variants is no longer an identifier.

Contributor Author

Yeah, that'd be cleaner. Potentially that could introduce some ambiguity, but probably solvable by writing the parser rule by hand. I'll go down that way
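
For illustration only (this is not code from the PR), the "`address` allowed, but not as the head" constraint could be checked on top of a flat separated list roughly like this. The sketch is plain Rust and assumes a hypothetical `parse_identifier_path` helper working on a dotted string; only the `IdentifierPathElement` name comes from the suggestion above, and it does not validate identifier characters.

```rust
/// Hypothetical element type, named after the `IdentifierPathElement` suggestion.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum IdentifierPathElement {
    /// A regular identifier such as `foo`.
    Identifier(String),
    /// The reserved `address` keyword used as a path element.
    AddressKeyword,
}

/// Splits a dotted path into elements and rejects `address` in head position.
/// Assumption: the input is already a dotted string; real lexing is out of scope.
pub fn parse_identifier_path(source: &str) -> Result<Vec<IdentifierPathElement>, String> {
    let mut elements = Vec::new();
    for (index, part) in source.split('.').enumerate() {
        if part.is_empty() {
            return Err(format!("empty path element at position {index}"));
        }
        if part == "address" {
            if index == 0 {
                return Err("`address` cannot be the head of an identifier path".to_string());
            }
            elements.push(IdentifierPathElement::AddressKeyword);
        } else {
            elements.push(IdentifierPathElement::Identifier(part.to_string()));
        }
    }
    Ok(elements)
}
```

With this shape, `a.address.b` is accepted while `address.b` is rejected, which is the kind of rule a hand-written parser rule would have to enforce if the grammar itself used a plain separated list.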

Comment on lines +68 to +71
The cases using empty tuples are still ambiguous: `(,,,) = ...` can still be a `TupleDeconstructionStatement` or
an `AssignmentExpression` with a `TupleExpression` on the lhs.
Contributor

can still be a TupleDeconstructionStatement or an AssignmentExpression

Which one? I wonder if we have existing cst_output tests for this case?

Contributor

Also, how about var (,,,,)? This is legal AFAICT.

Contributor Author

Which one?

This is a tricky question since there's still no parser for V2, but I'll try to answer it:

  • Right now the V2 language definition (with these changes) is still ambiguous, so in theory parsing either one of those options is correct.
  • If we choose to do as V1 does and give priority to definitions higher up in the definition.rs file, then they'd be parsed as TupleDeconstructionStatement.
  • Once the V2 parser is done, we need to handle this ambiguity and choose one. I'd go for those cases being an AssignmentExpression, since they're not declaring a variable at all.
  • Right now there's no way to express "a separated list where every item is optional, but at least one item has to be present" in the language definition DSL, which makes this difficult to express there.
    • We could separate it into a prefix of empty tuple elements (i.e. `(,,,,`), then an element that must be there (i.e. `bool a`), and then a postfix of possibly empty tuple elements (i.e. `, bool b, , ,)`); but that would let a parsing problem make the CST and the general API worse.
  • Also, I just checked, and solc doesn't seem to allow empty tuples at all (i.e. `(,,,) = ...` or `() = ...`) after 0.5.0, so maybe we need to validate this after parsing.
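
As a rough sketch of the last two bullets (not code from the PR), a post-parse check could locate the first present element in a list of optional slots. This yields the prefix / required element / postfix split described above and rejects the all-empty case. The function name `split_around_first_element` and the use of `Option<T>` for tuple slots are assumptions made for this example.

```rust
/// Splits tuple slots into:
///   (number of leading empty slots, first present element, remaining slots).
/// Returns `None` when every slot is empty, e.g. for `(,,,)` or `()`.
/// Assumption: each slot is modelled as `Option<T>`, with `None` meaning "empty".
pub fn split_around_first_element<T>(slots: &[Option<T>]) -> Option<(usize, &T, &[Option<T>])> {
    // Index of the first slot that actually holds an element.
    let position = slots.iter().position(Option::is_some)?;
    let element = slots[position].as_ref()?;
    // Everything after the required element may again contain empty slots.
    Some((position, element, &slots[position + 1..]))
}
```

Running such a check as a validation step keeps the parsed shape a single flat list, instead of encoding the prefix, required element, and postfix as three separate fields in the CST.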

I wonder if we have existing cst_output tests for this case?

We have some cases, I added a few more (they're only testing V1 for now)

Also, how about var (,,,,)? This is legal AFAICT.

This is legal with the new definition as well, since the elements of the Separated (UntypedTupleDeconstructionElement) have an optional identifier as their field:

```rust
Struct(
    name = UntypedTupleDeconstructionElement,
    enabled = Till("0.5.0"),
    fields = (name = Optional(reference = Identifier))
),
```
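
As a rough illustration (not part of the PR), the effect of the optional field can be mirrored in plain Rust. The struct name follows the definition above; the `parse_untyped_elements` helper and the use of `String` for identifiers are assumptions made for this example, and the helper only splits the text between the parentheses on commas.

```rust
/// Plain-Rust mirror of the definition above: the name is optional,
/// so an empty slot between two commas is still a valid element.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct UntypedTupleDeconstructionElement {
    pub name: Option<String>,
}

/// Parses the text between the parentheses of `var ( ... )`.
/// Assumption: names are simple identifiers; no validation is performed.
pub fn parse_untyped_elements(inner: &str) -> Vec<UntypedTupleDeconstructionElement> {
    inner
        .split(',')
        .map(|slot| {
            let name = slot.trim();
            UntypedTupleDeconstructionElement {
                name: if name.is_empty() {
                    None
                } else {
                    Some(name.to_string())
                },
            }
        })
        .collect()
}
```

Under this model, the inner text of `var (,,,,)` yields five elements whose `name` is `None`, and `a, , b` yields a named element, an empty one, and another named one.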


This makes certain cases that were allowed before disallowed in V2, in particular having untyped declarations (like `(a, bool b) = ...`)
or having typed together with `var` (like `var (a, bool b) = ...`).
The cases where using empty tuples are still ambiguous, `(,,,) = ...` can still be a `TupleDeconstructionStatement` or a
Contributor

For another PR/discussion, we considered merging TupleDeconstructionStatement and VariableDeclaration, to merge all variable declarations together, however I think this will look a bit artificial since their shape is quite different.

I think having this distinction (declaring new vars VS assigning to existing ones) is worth it, both syntactically and semantically. WDYT of having the following structure, if it works with LALR?

  • use VariableDeclarationStatement for any syntax that declares a new name:
    • var x = ... already supported
    • int x = ... already supported
    • change VariableDeclarationStatement::name field to an enum with two variants:
      • name: Identifier -> existing
      • elements -> a struct holding LeftParen + Separated(elements) + RightParen
  • use AssignmentExpression for any syntax that just assigns values to the LHS:
    • x = ....
    • (x, y) = ....
    • (,,,) = ....

Contributor Author

I think having this distinction (...) is worth it

I completely agree

The problem I see with the proposed structure is that currently the int and the var in the first 2 cases are captured by the same definition, so you could end up with a language accepting something like int (a, b, c) = ....

But also, since you want to allow the elements of the tuple to carry their own types, you'd maybe want to make the var/int struct optional, to allow (bool a, uint b) = ..., but that would also parse x = ... as a valid declaration.

Coming from a perspective of appeasing LALRPOP, I'd say the distinction has to be a bit stronger, so I'd make VariableDeclarationStatement an enum over:

  • SingleExplicitDeclaration: int x = ...
  • MultiExplicitDeclaration: (bool a, , int b) = ...
  • ImplicitDeclaration (until 0.5.0): var a = ... and var (a, , b) = ... (so this one would have the enum allowing either a single Identifier or a tuple of Identifier)

This could be simplified when lowering it to an AST.

What do you think? I'll try to push a single commit with these changes so you can review them.
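
To make the three-variant proposal concrete, here is a minimal plain-Rust sketch (not the definition DSL and not code from the PR). The variant names come from the list above; `TypedName`, `ImplicitTarget`, `Declared`, and `lower` are hypothetical names introduced for this example, and types are modelled as plain strings.

```rust
/// A typed name such as `int x` (hypothetical helper type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TypedName {
    pub type_name: String,
    pub name: String,
}

/// Target of a `var` declaration: a single name or a tuple of optional names.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ImplicitTarget {
    Single(String),
    Tuple(Vec<Option<String>>),
}

/// The three shapes from the list above.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum VariableDeclarationStatement {
    /// `int x = ...`
    SingleExplicitDeclaration(TypedName),
    /// `(bool a, , int b) = ...`
    MultiExplicitDeclaration(Vec<Option<TypedName>>),
    /// `var a = ...` or `var (a, , b) = ...`
    ImplicitDeclaration(ImplicitTarget),
}

/// Simplified AST slot: a declared name with an optional explicit type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Declared {
    pub type_name: Option<String>,
    pub name: String,
}

/// Lowers every variant to one flat list of slots; `None` is an empty slot.
pub fn lower(statement: &VariableDeclarationStatement) -> Vec<Option<Declared>> {
    let typed = |t: &TypedName| Declared {
        type_name: Some(t.type_name.clone()),
        name: t.name.clone(),
    };
    let untyped = |name: &String| Declared {
        type_name: None,
        name: name.clone(),
    };
    match statement {
        VariableDeclarationStatement::SingleExplicitDeclaration(t) => vec![Some(typed(t))],
        VariableDeclarationStatement::MultiExplicitDeclaration(elements) => {
            elements.iter().map(|e| e.as_ref().map(typed)).collect()
        }
        VariableDeclarationStatement::ImplicitDeclaration(ImplicitTarget::Single(name)) => {
            vec![Some(untyped(name))]
        }
        VariableDeclarationStatement::ImplicitDeclaration(ImplicitTarget::Tuple(names)) => {
            names.iter().map(|n| n.as_ref().map(untyped)).collect()
        }
    }
}
```

The point of the sketch is that the syntactic variants stay distinct, so `int (a, b, c) = ...` has no representation, while a later pass can still collapse them into one uniform shape.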

@teofr teofr force-pushed the teofr/node_checker branch from d95ed0d to db16409 Compare February 2, 2026 15:57
@teofr teofr force-pushed the teofr/v2-definition-changes branch from fb9d550 to c14dcbf Compare February 2, 2026 16:52